Partition-Based Clustering in Object Bases: From Theory to Practice
Authors
Abstract
We classify clustering algorithms into sequence-based techniques, which transform the object net into a linear sequence, and partition-based clustering algorithms. Tsangaris and Naughton [TN91, TN92] have shown that partition-based techniques are superior. However, their work is based on a single partitioning algorithm, the Kernighan-Lin heuristic, which is not applicable to realistically large object bases because of its high running-time complexity. The contribution of this paper is twofold: (1) we devise a new class of greedy object graph partitioning algorithms (GGP) whose running-time complexity is moderate while still yielding good-quality results. For large object graphs, GGP is the best known heuristic with an acceptable running time. (2) We carry out an extensive quantitative analysis of all well-known partitioning algorithms for clustering object graphs. Our analysis shows that no single algorithm is superior for all object net characteristics. Therefore, we derive a multi-dimensional grid: the dimensions correspond to particular characteristics of the object base configurations, and the grid entries indicate the best clustering algorithm for each configuration. We propose an adaptable clustering strategy that first determines the characteristics of the clustering problem (given by, e.g., the number and size of objects and the degree of object sharing) and then applies the most suitable algorithm, as obtained from the multi-dimensional grid.
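The abstract does not spell out how GGP works, so the following is only a minimal Python sketch of a generic greedy graph-partitioning heuristic, under assumed inputs: every object has a size, every edge between two objects carries a weight (for example, an estimated traversal frequency), and a partition must fit on one page of `page_size` bytes. Edges are processed from heaviest to lightest, and two partitions are merged whenever the result still fits on a page. The names `greedy_partition` and `cut_weight` and the example data are hypothetical; this illustrates the greedy idea behind partition-based clustering and is not claimed to be the paper's GGP algorithm.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (object, object, weight); assumed representation


def greedy_partition(sizes: Dict[str, int], edges: List[Edge],
                     page_size: int) -> List[List[str]]:
    """Hypothetical greedy sketch: merge partitions along the heaviest
    edges as long as the merged partition still fits on one page."""
    # Every object starts in its own singleton partition.
    part_of: Dict[str, int] = {obj: i for i, obj in enumerate(sizes)}
    members: Dict[int, List[str]] = {i: [obj] for obj, i in part_of.items()}
    load: Dict[int, int] = {i: sizes[obj] for obj, i in part_of.items()}

    # Heaviest edges first: co-locating them saves the most page faults.
    for u, v, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        pu, pv = part_of[u], part_of[v]
        if pu == pv or load[pu] + load[pv] > page_size:
            continue  # already on the same page, or the merge would overflow it
        if load[pu] < load[pv]:
            pu, pv = pv, pu  # merge the lighter partition into the heavier one
        for obj in members[pv]:
            part_of[obj] = pu
        members[pu].extend(members.pop(pv))
        load[pu] += load.pop(pv)

    return [sorted(objs) for objs in members.values()]


def cut_weight(partitions: List[List[str]], edges: List[Edge]) -> float:
    """Total weight of edges whose endpoints land on different pages."""
    page = {obj: i for i, objs in enumerate(partitions) for obj in objs}
    return sum(w for u, v, w in edges if page[u] != page[v])


if __name__ == "__main__":
    sizes = {"a": 40, "b": 30, "c": 30, "d": 50, "e": 20}
    edges = [("a", "b", 10), ("b", "c", 8), ("c", "d", 5),
             ("d", "e", 9), ("a", "e", 1)]
    parts = greedy_partition(sizes, edges, page_size=100)
    print(parts, cut_weight(parts, edges))  # [['a', 'b', 'c'], ['d', 'e']] 6
```

With a 100-byte page, the heavy edges a-b, d-e, and b-c end up inside pages, and only the light edges c-d and a-e cross page boundaries. A single greedy pass over the sorted edges costs O(E log E) for sorting plus the merge work, which hints at why a greedy approach can scale to object graphs where Kernighan-Lin style iterative swapping becomes too expensive.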
Similar resources
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
Continuous zoning of soil electrical conductivity and acidity based on fuzzy clustering for the Qom Plain
Soil electrical conductivity and acidity are the most important chemical soil factors for agriculture. Soil properties change continuously, so a method that takes this continuity into account can give a better picture of changes in soil characteristics. The objectives of this research are to investigate the relations between measured electric...
Clustering in Object Bases
We investigate clustering techniques that are specifically tailored for object-oriented database systems. Unlike traditional database systems, object-oriented data models incorporate the application behavior in the form of type-associated operations. This source of information is exploited for clustering decisions by statically determining the operations' access behavior applying data flow analysis...